Antipattern: Always Depend on One’s Parent

Let's take a look at some drawbacks of maintaining trees with an Adjacency List.

Let’s imagine a query in which there is a need to retrieve the parents or children of a node. What should be our solution to such a query?

The answer is Adjacency List.

Any solution that we would find for this problem would include modifying the code that manages the tree data structures. Let’s discuss it in detail.

Adjacency list#

The solution we might first think of, and that is commonly shown in books and articles, is to add a column parent_id. This column references another comment in the same table, and the user can create a foreign key constraint to enforce this relationship.

This design is called Adjacency List. It’s probably the most common design software developers use to store hierarchical data.

The SQL to define this table is presented below.

Creating comments table

The Adjacency List Entity-Relationship Diagram is shown below.

Adjacency List Entity-Relationship Diagram

An illustration of the tree, “Threaded comments,” is shown below.

Threaded comments illustration

Querying a tree to find the immediate children of a node#

Adjacency List fails to be a solution for one of the most common tasks we need to do with a tree: query all descendants.

We can retrieve a comment and its immediate children using a relatively simple query:

Retrieving a comment and its immediate children

In the results, we will see the data for comment on the left side and the information of its immediate child or children on the right side.

However, this queries only two levels of the tree. One characteristic of a tree is that it can extend to any depth, so we need to be able to query the descendants without regard to the number of levels. For example, we may need to compute the COUNT() of comments in the thread or the SUM() of the cost of parts in a mechanical assembly.

This kind of query is awkward when we use Adjacency List because each level of the tree corresponds to another JOIN, and the number of JOINs in an SQL query must be fixed.

Querying a tree to find the descendants of a node#

The following query retrieves a tree of up to four levels of depth but cannot retrieve the tree beyond that.

Retrieving ancestors of a node for a depth up to four levels

Note: There are no records for the fourth level, so we are seeing NULL in all those cells.

This query is also awkward because it includes descendants from progressively deeper levels by adding more columns. This makes it hard to compute an aggregate such as COUNT().

Synopsis: Naive Trees
Maintaining a Tree with an Adjacency List
Mark as Completed
Report an Issue